All-ferroelectric implementation of reservoir computing

Reservoir computing (RC) offers efficient temporal information processing with low training cost. All-ferroelectric implementation of RC is appealing because it can fully exploit the merits of ferroelectric memristors (e.g., good controllability); however, this has been undemonstrated due to the challenge of developing ferroelectric memristors with distinctly different switching characteristics specific to the reservoir and readout network. Here, we experimentally demonstrate an all-ferroelectric RC system whose reservoir and readout network are implemented with volatile and nonvolatile ferroelectric diodes (FDs), respectively. The volatile and nonvolatile FDs are derived from the same Pt/BiFeO3/SrRuO3 structure via the manipulation of an imprint field (Eimp). It is shown that the volatile FD with Eimp exhibits short-term memory and nonlinearity while the nonvolatile FD with negligible Eimp displays long-term potentiation/depression, fulfilling the functional requirements of the reservoir and readout network, respectively. Hence, the all-ferroelectric RC system is competent for handling various temporal tasks. In particular, it achieves an ultralow normalized root mean square error of 0.017 in the Hénon map time-series prediction. Besides, both the volatile and nonvolatile FDs demonstrate long-term stability in ambient air, high endurance, and low power consumption, promising the all-ferroelectric RC system as a reliable and low-power neuromorphic hardware for temporal information processing.

Ref. [2] Hf0.5Zr0.5O2based FeFET Based on Table S1, two facts regarding the existing ferroelectric-based RC systems can be extracted: (i) The ferroelectric-based RC systems in most of the recent works used ferroelectric tunnel junctions (FTJs) and ferroelectric field-effect transistors (FeFETs) to implement only the reservoirs, while the readout networks were either simulated or implemented with a mature RRAM chip. Only the work in Ref. [3] attempted to use a Hf0.5Zr0.5O2 (HZO)based FeFET to implement the readout network, but this FeFET, identical that used for the reservoir, was indeed volatile and failed to meet the functional requirements of the readout network. S-4 (ii) The ferroelectric-based RC systems in most of the recent works were demonstrated at the device level. Only the work in Ref. [1] reported a circuit-level demonstration, but the readout network was implemented with RRAM rather than ferroelectric devices.
From the above two facts, it is noted that so far there has been no demonstration of an all-ferroelectric RC system (particularly at the circuit level). This is disappointing because the all-ferroelectric RC system promises higher performance and robustness than the existing RRAM-based RC systems (see reasons in the Introduction section of the main text). In this work, we develop an all-ferroelectric RC system, where the reservoir and readout network are implemented with the volatile and nonvolatile FDs, respectively, and demonstrate its temporal signal processing capability at the circuit level (albeit with simple tasks). Therefore, our work indeed realizes an all-ferroelectric RC system for the first time.
Table S1 also reveals several advantages of our all-ferroelectric RC system, as summarized below:

(a) The use of FDs as building blocks
As shown in Table S1, only our work uses FDs as the building blocks of an RC system, while previous works used FTJs and FeFETs. As explained in detail in the Introduction section of the main text, FTJ and FeFET possess inherently large depolarization fields (Edps), making them voluntary to exhibit volatile characteristics while difficult to be engineered into nonvolatile memristors to implement the readout network. By contrast, FD is inherently subjected to a much smaller Edp, and hence it can readily function as a nonvolatile memristor. In addition, by judiciously introducing an Eimp without changing S-5 the device structure, the FD is engineered to be volatile. The volatile and nonvolatile FDs are further used to implement the reservoir and readout network, respectively, eventually forming a well-functioning RC system.
Using FDs as the building blocks is therefore the key to the successful implementation of the all-ferroelectric RC system, which would provide great inspiration for researchers working on ferroelectric-based neuromorphic computing. Additionally, it is noteworthy that the derivation of the volatile and nonvolatile FDs from the same device structure has not been realized in other types of ferroelectric memristors. Table S1 highlights that the polarization dynamics of our volatile FD are clearly revealed. However, the polarization dynamics of previously used FTJs and FeFETs (for reservoirs) remain largely unclear, because it is difficult or even impractical to directly measure polarization in these devices owing to either large leakage current or poor charge screening. By contrast, FD allows the direct polarization measurement because of the suppressed leakage current (arising from the relatively thick ferroelectric film and the reverse-biased Schottky barrier) and the good charge screening provided by the metal electrodes [6]. Polarization dynamics of our volatile FD have thus been systematically  Figure 3(b) in the main text and Figure S9], which could greatly enrich the polarization dynamics.

S-6
Table S1 also presents that the richness of polarization dynamics of our volatile FD is in principle the highest, as explained as follows. For the Hf0.5Zr0.5O2 (HZO)-based FeFET [2], it exhibits only the nonlinear, history-dependent polarization switching behavior, which is common for all the ferroelectric devices. It therefore has the lowest richness of polarization dynamics. For other FTJs [1] and FeFETs [3][4][5], although the Edp-induced polarization decay was claimed to exist in these devices, it was not unambiguously revealed  Figure S9(c)]. Therefore, the richness of polarization dynamics of our volatile FD is in principle higher than those of previous devices with Edp.

(c) Low power consumption
Table S1 compares the power consumptions of different ferroelectric memristors used for reservoirs. Our volatile FD exhibits the lowest power consumption of ~11.8 μW, well due to the relatively low operation voltage and suppressed leakage current, as mentioned earlier. Such power consumption is even at least 3 times lower than those of the state-ofthe-art filamentary memristors for reservoirs [7][8][9]. Also note that the power consumption of our nonvolatile FD for readout network is even lower, reaching ~140 nW. The details about the power consumption estimation can be found in Supplementary Note 4.

S-7
In terms of the prediction performance, although the counterparts in Ref. [1] and [3] achieve higher accuracies than our all-ferroelectric RC system in the MNIST handwritten digit recognition task (see Table S1), they use either multilayer readout networks [1] or more complex pre-processing [3], making the direct comparison between these accuracies unfair. In fact, the 89.5% accuracy achieved by our all-ferroelectric RC system is 6.5% higher than that achieved by a pioneering diffusive memristor-based RC system [10], whose image pre-processing approach, reservoir architecture, and readout network size are similar to ours.
In addition, our all-ferroelectric RC system achieves an ultralow NRMSE value of 0.017 in the Hénon map time-series prediction (see Figure 6 in the main text), which also underscores its high prediction performance.
S-8 Figure S1. XRD θ-2θ scans of the BFO films grown on the SRO-buffered (001)-oriented STO substrates under oxygen pressures of 15 Pa and 19 Pa.
Only (00l) diffraction peaks from BFO, SRO, and STO are observed in both of the two BFO films, evidencing their phase purity. In addition, according to the positions of the BFO (00l) peaks, one can conclude that both BFO films exhibit a rhombohedral-like phase [11].
S-9 Both of the BFO films exhibit a step-bunching morphology, which is a typical feature of relatively thick epitaxial BFO films grown on SRO-buffered (001)-oriented STO substrates with no miscut [12]. In addition, the surfaces of both films are rather flat with root-mean-square roughness values smaller than ~1.2 nm.
S-10   As shown in the inset of Figure S4(a) [ Figure S4(b)], the device is first set in the Pup suggests that the dominant conduction mechanism in the device is interface-limited rather than bulk-limited. Two typical interface-limited conduction models, namely, Schottky emission and Fowler-Nordheim (FN) tunneling, were mainly considered in this work.

S-17
The FN tunneling was first attempted for fitting (results not shown). However, the fitted values of barrier heights are unreasonably small (on the order of 0.001 eV), thus excluding the FN tunneling as the dominant conduction mechanism in the Pt/BFO/SRO device.
Then, the I-V curves were fitted by using the Schottky emission model given below [13]: where A is the electrode area, A * is the Richardson constant, T is the absolute temperature, q is the electron charge, k is the Boltzmann constant, Neff is the effective charge density, ε0 is the vacuum permittivity, εop and εst are the optical and static dielectric constants of the ferroelectric layer, respectively, and Vbi′ is the apparent built-in potential. Here, the εop and εst values of BFO used for fitting are 6.25 [14] and 60 [15], respectively.
The fitting results are shown as the black lines in Figure S6. It is seen that all the ln|I|-(|V|+Vbi′) 1/4 curves agree well with the fitting lines in certain voltage ranges of interest.
Here, the voltage range of interest covers the voltages that are around the read voltage (e.g., ±1 V) and below the critical voltage for resistive switching. The good fits suggest that for the device in both Pup and Pdown states the Schottky emission may be the dominant conduction mechanism in these voltage ranges.
The BFO film is typically a p-type semiconductor due to the Bi loss [16], and the ptype character of our BFO film has been demonstrated recently [6]. Based on the intercepts of the fitting lines on the ln|I| axis, the heights of the BFO/SRO and Pt/BFO barriers can be extracted. Specifically, using the intercepts in Figure S6 switching-induced barrier height change causes the resistive switching. We can now illustrate the resistive switching mechanism in more detail by using the energy band diagrams.
As mentioned earlier, our BFO film is a p-type semiconductor and two p-type Schottky barriers may be formed at both the Pt/BFO and BFO/SRO interfaces. Given that the device is subjected to an initial voltage sweep of +Vmax → 0, it is thus set in the initial Pdown state.
Then, as the voltage sweeps from 0 to -Vset, the BFO/SRO barrier is reverse-biased and thus limits the hole injection. Because the positive polarization charge at the BFO/SRO barrier enhances the barrier height, the HRS is obtained [see Figure S7 Therefore, the mechanism for the switchable diode-type resistive switching has been well explained, and it agrees with those reported previously [16,19]. Also note that the polarization switching-induced barrier height modulation, i.e., the key to the proposed resistive switching mechanism, has been proved by the fitting results in Figure S6. The method of the polarization retention measurement has been described in Figure   S4. Figure S9(a) shows that all the negative monopolar P-V loops exhibit an "S" shape, which is a fingerprint of polarization switching. Moreover, the switched polarization becomes larger with increasing delay period. It is therefore deduced that the polarization back-switching occurs in the Pup state, and the back-switched polarization increases with the delay period. By contrast, Figure S9(b) shows that the positive monopolar P-V loops at different delay periods all exhibit very small hysteresis windows and they almost overlap, suggesting that almost no polarization back-switching occurs in the Pdown state.

S-23
We further plot the ±Pr as a function of delay period in Figure  It is therefore concluded that the Eimp is stable against electric field (including both pulsed and DC electric fields) at room temperature. Note that this conclusion is valid throughout this study because the pulsed and DC voltages applied in this study are no larger than ±4 V and ±3 V, respectively. Further enhancing the electric field may invalidate this conclusion, which is out of the scope of this study.

S-25
Figure S11. P-V loops of the Pt/BFO (19 Pa)/SRO device before and after leaving it in the ambient air at room temperature for 30 days.
The P-V loops of the device before and after a 30-day exposure to the ambient air almost overlap, suggesting that the Eimp almost does not change after a long duration. It is therefore concluded that the Eimp is stable over time at room temperature. To evaluate the temperature stability of Eimp, the initial P-V loop of the Pt/BFO (19 Pa)/SRO device was first measured at room temperature. Then, the device was set to the Pup state and then annealed at 250 o C for 60 min in air. After this treatment, the P-V loop exhibits a positive voltage offset (see Figure S12), suggesting that the direction of Eimp is now pointing upward, opposite to that of the initial Eimp.

S-26
Therefore, Eimp can be changed at high temperature. Nevertheless, because all the electrical measurements in this study were performed at room temperature, the downward Eimp in the 19 Pa film can be considered as relatively stable.
Note that the origin for Eimp and why Eimp changes at high temperature are explained in the discussion on Figure S13. The XPS results can be interpreted as follows. binding energy relative to those measured in the bulk regions (i.e., at the ~35 and ~70 nm depths). This peak shift is indeed not very small considering that the upper limit of the peak shift (i.e., the binding energy difference between the Fe 2+ and Fe 3+ 2p3/2 peaks) is only ~1.3 eV [20]. In addition, this peak shift is still greater than the energy resolution of our XPS system, i.e., 0.05 eV [21,22]. It may thus be safe to deduce that the Fe valence is lower at the surface than in the bulk regions.
By further comparing Figures S13(a) and (b), it is revealed that the average peak position of the 15 Pa film is at a lower binding energy compared with that of the 19 Pa film, implying that the average Fe valance of the former is lower than that of the latter. Pa film. This agrees with the fact that a lower oxygen pressure used for film growth can lead to a larger amount of oxygen vacancies.
Besides the amounts of oxygen vacancies, the distributions of oxygen vacancies are also different in these two films. As shown in Figure S13 S-29

(b) Possible origin for the difference in oxygen vacancy distribution
To explain the difference in oxygen vacancy distribution, we conjecture that during the growth of a BFO film, a sufficiently high oxygen pressure on the surface may cause the bulk oxygen vacancies to migrate to the surface (using the oxygen pressure gradient as the driving force) [23]. However, when the oxygen pressure on the surface is relatively low, the bulk oxygen vacancies may still remain in the bulk due to the insufficient oxygen pressure gradient.

(c) Correlation between oxygen vacancies and Eimp
For the 15 Pa film, the uniform distribution of oxygen vacancies throughout the film can explain the absence of Eimp.
For the 19 Pa film, the preferable distribution of oxygen vacancies near the surface may be the origin for the downward Eimp. It is known that the mobility of oxygen vacancies is very low at room temperature [24], which can explain why Eimp is considerably stable against electric field and time at room temperature (see Figures S10 and S11). However, at high temperature the oxygen vacancies become more mobile [25], and they may migrate toward the bottom interface to compensate the negative polarization charge when the Pup state is established beforehand [6,16]. This can explain why the direction of Eimp is reversed after the thermal treatment (see Figure S12). polarization switching and the critical voltage for resistive switching, as described in Figure   S5. The method for the I-V curve fitting has been described in detail in Figure S6

S-34
As the voltage sweeps from 0 to -Vmax and back to 0, the BFO/SRO barrier is reversebiased and thus limits the hole injection. The evolution of the BFO/SRO barrier height has been described in Figure S7. In brief, the BFO/SRO barrier exhibits a relatively large height in the Pdown state [ Figure S16 As has been demonstrated in Figure S15, the volatile FD exhibits the polarizationcontrolled Schottky emission in the negative voltage region. In addition, Figure S9  Figure S18(a) shows that the read current gradually increases during the first 3 write pulses due to the short interval (9 ms). However, the read current reduces when the interval between the 3rd and 4th write pulses (Δt34) increases to 1000 ms. Moreover, as seen from Since the linear resistor has no memory effect, the last pixel in a row of a digit (i.e., the amplitude of the last pulse in the corresponding pulse train) determines the final current response of the linear resistor. As a result, it is seen from Figure S23(b) that the linear resistors corresponding to different rows produce identical final current responses when the digits "0", "4", "8", and "9" from the training set [see Figure 5(a) in the main text] are presented. This means that the reservoir states corresponding to the digits "0", "4", "8", and "9" are the same, making these digits indistinguishable. Similarly, the digits "5" and "6" from the training set are also indistinguishable. The linear resistor-based reservoir is therefore ineffective.

S-42
As a comparison, it is shown in Figure 5(d) in the main text that the volatile FD-based reservoir can produce well distinguishable reservoir states for the 10 digits in the training set. The compared results of Figure S23(b) and Figure 5(d) in the main text therefore demonstrate that the volatile FD plays an essential role in our reservoir. obtained from the all-ferroelectric RC system versus the target labels.

S-43
All the images are pre-processed, as schematically illustrated in Figure S25 Each row is further chopped into 5 sections, and each section is converted to a 4-timeframe pulse train. The pulse amplitude is −2.7 V (0 V) when the pixel value is 1 (0), and the pulse width is fixed at 2 ms.

S-45
After the pre-processing, each image is converted to 22 × 5 pulse trains, which are subsequently fed to a reservoir consisting of 22 volatile FDs. One volatile FD is responsible for processing 5 pulse trains. After each pulse train the device's final conductance state is read out and then its conductance is reset to the initial value through a reset pulse. 110 (22 As an illustrative example, Figure S25(b) shows that the reservoir states corresponding to the digits "3", "5", and "8" are distinctly different, evidencing the effectiveness of the volatile FD-based reservoir. Figure S25(c) shows the confusion matrix produced by our all-ferroelectric RC system on the test set. Most of the digits are correctly classified, and the recognition accuracy reaches 89.5%. This accuracy is 6.5% higher than that achieved by a pioneering diffusive memristor-based RC system [10], whose image pre-processing approach, reservoir architecture, and readout network size are similar to ours. Note that several studies reported even higher accuracies. However, they typically used more complex pre-processing approaches [3,8], different reservoir architectures [27], or multilayer readout networks [1], making it unfair to directly compare their accuracies with ours. S-46

Supplementary Note 1. Dataset for the curvature discrimination task
In the curvature discrimination task, the original curve follows the equation: where a is a parameter which can be varied. The curve is then rotated around the origin by an angle of θ (θ is a variable parameter). Afterward, the curve is shifted horizontally and vertically by p and q, respective, where p and q are also variable parameters. The horizontal coordinate interval of the curve is set to be [-4, 4]. By varying the parameters a, θ, p, and q (see Table S2), 138 different curves are generated. 102 curves are then selected from the whole 138 curves and constitute the training set [see Figure S20(a)], while the rest 36 curves constitute the test set [see Figure S20(b)].

Supplementary Note 2. Possible factors influencing the performance of curvature discrimination
In the curvature discrimination experiment, the pre-processing mainly includes 2 steps: 1) chopping each curve into 3 sections, and 2) converting each section to a 3-timeframe pulse train. The beginning, middle, and end sections of a curve are therefore represented by 3 pulse trains, respectively. The 3 pulse trains are then applied to a reservoir consisting of 3 volatile FDs, with each device processing one pulse train.
As seen above, the key function of the pre-processing is converting the spatial information in the curve into the temporal features in the streaming inputs. How the curve is pre-processed thus affects the temporal features to be extracted by the volatile FDs, which in turn influences the classification accuracy.
For example, if the curve is not chopped, one 9-timeframe pulse train is sufficient to represent this curve (assuming that the product of the number of pulse trains and that of timeframes is 9). Correspondingly, only one volatile FD is used to process this pulse train.
Due to the short-term memory of the volatile FD, only the spatial information near the end of the curve can be well captured while that in the beginning and middle may be lost, resulting in a poor accuracy. On the other hand, if the curve is chopped into too many sections, e.g., 9 sections, 9 1-timeframe pulse trains are needed to represent this curve (also assuming that the product of the number of pulse trains and that of timeframes is 9).
Correspondingly, 9 volatile FDs are used to process the 9 pulse trains. Because each pulse train has only 1 timeframe, it is unable to use the memory effect of the volatile FDs. The reservoir in this case is ineffective.

S-61
Therefore, to take full advantage of the volatile FD-based reservoir, we have chopped the curve into an appropriate number of sections (i.e., 3 sections), and converted each section to a pulse train with an appropriate number of timeframes (i.e., 3 timeframes).
Similar way of pre-processing has been widely used for physical RC systems when handling spatial pattern recognition tasks [27,28].
Although the pre-processing has certain effects on the RC performance, we argue that it plays an auxiliary role. The volatile FD-based reservoir indeed plays the essential role because it is responsible for extracting temporal features. This can be evidenced through control experiments as follows. In where the input x is the y-coordinate of the point on the input curve, and t is a parameter which can be optimized. Note that for different tasks, x and t can be changed accordingly.

S-62
As shown in Figure 4f in the main text, both the linear resistor-and sigmoid-based RC systems achieve the same accuracy of 83.3% on the test set. The same accuracy of the two control systems may be caused by the fact that both the linear resistor and sigmoid function have no memory effect. In addition, the same accuracy in turn suggests that the nonlinearity of the sigmoid function does not contribute to the RC performance in this simple task. Figure 4f in the main text also presents that the accuracies of the two control systems are apparently lower than the 100% accuracy of the volatile FD-based RC system (i.e., the all-ferroelectric RC system). To understand why the all-ferroelectric RC system outperforms the two control systems, typical misclassified results obtained from the two control systems are specifically analyzed. Figure S21 which is well attributed to its memory effect allowing the pulse history to influence the conductance. This in turn results in a correct classification given the readout weights trained offline, as shown in Figure S21(d).

S-63
The above compared results therefore confirm that the volatile FD-based reservoir is the key for the superior performance of our RC system in the curvature discrimination.
Besides, the nonvolatile FDs, which provide multilevel nonvolatile conductance states for the mapping of readout weights, are also important for the RC performance.

Supplementary Note 3. Possible factors contributing to the high performance in the Hénon map prediction
In the experiment of Hénon map prediction, there are several possible factors which may influence the performance, such as the volatile FD-based reservoir, nonvolatile FDbased readout network, pre-processing, and post-processing. These factors are analyzed in detail as follows.

(a) Role of post-processing
The readout network was trained with linear regression in this experiment. No nonlinear post-processing was performed. There is thus little contribution from the postprocessing to the RC performance, which can be evidenced by a control experiment presented in Section (b).

(b) Role of pre-processing (linear signal conversion)
The pre-processing mainly involves two processes: 1) the mask process which can generate virtual nodes, and 2) the linear conversion of input signals to pulse voltages. The role of the mask process will be discussed later. The linear conversion of input signals to pulse voltages is a widely used way to pre-process the time-series data [7,8], and such linear signal conversion may contribute little to the RC performance [29].
To confirm the minor roles played by the pre-processing of linear signal conversion and the post-processing of linear regression, we have performed a control experiment. The input signals x(n) and x(n -1), after the linear signal conversion, were directly fed to a readout network which was trained by the linear regression. Neither mask process nor reservoir was used. The predicted time series are shown in Figure S28(b), which deviate S-65 significantly from their corresponding ideal targets. The NRMSE value on the test set is further calculated to be 0.98, which is extremely high. These results demonstrate that the pre-processing of linear signal conversion and the post-processing of linear regression are minor factors contributing to the RC performance.

(c) Role of mask process
The mask process is known to be capable of improving the RC performance since it can generate virtual nodes to effectively expand the reservoir size [7,8]. To demonstrate this in our RC system, the RC performance was investigated with varying number of masks (nm) and mask length (lm). In the experiment of Hénon map prediction, each mask sequence is processed by one volatile FD, generating lm virtual nodes. The reservoir size, or the total number of virtual nodes, is thus lm × nm. Figure S29(a) shows that NRMSE decreases with increasing nm (lm fixed at 3). This is within expectation because the reservoir size becomes larger when using more masks.
More reservoir states are thus generated, which can help to better capture the features of the input signals. In addition, as more masks are used, more volatile FDs are also used correspondingly. The device-to-device variation of the volatile FDs can help to expand the effective reservoir size, which is an additional factor contributing to the decrease of NRMSE.
Then, the effect of mask length lm was investigated by varying lm while fixing lm × nm at 24. Figure S29( [8]. This is the main cause for the rise of NRMSE with increasing M. In addition, the role of the device-to-device variation becomes weaker as lm increases (given that lm × nm is fixed at 24, a larger lm leads to a smaller nm).
This is an additional factor contributing to the rise of NRMSE.
As Figure S29(b) shows that lm = 3 is the optimal mask length under the constraint of lm × nm = 24, we therefore report the results obtained at lm = 3 and nm = 8 in the main text.
However, it should be noted that the optimal value of lm is task-dependent, and it may change if different pulse parameters and different constraints of lm × nm are used.
Although the above results have shown that the mask process contributes to the RC performance, it indeed plays an auxiliary role. The decisive factors are actually the device characteristics of the volatile and nonvolatile FDs, as demonstrated as follows.

(d) Roles of device characteristics of volatile and nonvolatile FDs
For an RC system, the nonlinearity and short-term memory of the reservoir and the multilevel nonvolatile weights of the readout network are of critical importance to the performance. These functional requirements of the reservoir and readout network are well fulfilled by our volatile and nonvolatile FDs, respectively.
For the nonvolatile FD, it is engineered to be free from Eimp, so it can exhibit good polarization stability and consequent nonvolatile memristive switching. The multilevel nonvolatile conductance states (>4 bits) can be used to precisely map the weights in the readout network, which is apparently critical for the RC performance.

S-67
On the other hand, the volatile FD with purposely introduced Eimp is used to implement the reservoir. Thanks to the complex polarization dynamics (including the nonlinear, history-dependent polarization switching under external field and spontaneous polarization back-switching induced by Eimp), the volatile FD well exhibits the nonlinearity and short-term memory as required by a reservoir. To demonstrate that the volatile FDbased reservoir is essential to the RC performance, we replaced the volatile FDs in the reservoir with linear resistors and sigmoid functions (without changing other factors like the mask process and the readout network) and investigated how the performance would change. As shown in Figure S28 Note that in the control RC systems using linear resistor-and sigmoid-based reservoirs, although the mask process (lm = 3 and nm = 8) is used, the performance is still poor. This is because the linear resistor and sigmoid function have no memory effect, causing the virtual nodes generated by the mask process to be independent of each other. Hence, the linear resistor-and sigmoid-based reservoirs are unable to capture the features in the temporal inputs, even though the mask process is used. This in turn suggests that the mask process alone could not result in good RC performance.

S-68
In fact, the mask process plays an auxiliary role, i.e., expanding the reservoir size. It works only under the condition that the devices in the reservoir (like our volatile FDs) possess nonlinearity and short-term memory. With such device characteristics, the virtual nodes are nonlinearly coupled, and the current state of a virtual node depends on its own previous state, the current state of its neighboring nodes, and the input signal applied to this node. The reservoir is hence capable of capturing both local temporal features within an input window and more global features among input windows, leading to good RC performance.

(e) Physical mechanisms underlying the device characteristics
As demonstrated above, the good performance of our all-ferroelectric RC system in the Hénon map prediction can be attributed to the following factors: the device characteristics of the volatile and nonvolatile FDs (essential) and the mask process (auxiliary). Below we will further analyze the physical mechanisms underlying the device characteristics, so as to gain deeper insights into why the performance is so good.
As shown in Eqs. (1) and (2)  voltage [2]. Additionally, the domains in the volatile FD are observed to be tiny and irregularly-shaped ( Figure S3), which typically results in a wide distribution of switching voltages. This could allow the nonlinear polarization switching to occur in a relatively wide range of voltages, covering the voltages applied in the Hénon map prediction task (the applied pulse voltages are within -2 V, and -2 V is around the coercive voltage where strong nonlinearity exists). Besides the nonlinear polarization switching, the nonlinear polarization-controlled conduction behavior, where the current is nonlinearly dependent on the polarization-controlled Schottky barrier height and the applied voltage [30], further adds to the device's nonlinearity.
In terms of the short-term memory in our volatile FD, it mainly originates from both the history dependence of polarization switching and the spontaneous polarization backswitching induced by Eimp. Some previously reported ferroelectric devices used only the history dependence of polarization switching to realize the short-term memory [2]; however, the memory effect may degrade or even disappear when the polarization is approaching saturation. This issue does not exist in our volatile FD because it also exhibits the spontaneous polarization back-switching induced by Eimp besides the history dependence of polarization switching. These complex polarization dynamics lead to a short-term memory effect with time constants in the millisecond scale. Accordingly, the pulse intervals used in the Hénon map prediction task are designed to match well with the time constants, ensuring the effectiveness of the short-term memory.
On the other hand, the nonvolatile FD-based readout network is also an important part of our RC system. The nonvolatile FD exhibits good polarization stability because of the absence of Eimp. Owing to this and the polarization-controlled conduction behavior, the S-70 nonvolatile FD exhibits multilevel nonvolatile conductance states (>4 bits), which can be used to precisely map the weights in the readout network. This is the major contribution from the nonvolatile FD to the RC performance.
In short, the device characteristics contributing to the good performance in the Hénon respectively. Such power consumption is at least 3 times lower than those of the state-ofthe-art filamentary memristors used for RC hardware systems [7][8][9]. However, the energy consumption of the volatile FD is relatively high, due to the large pulse width (i.e., 10 ms) as limited by our test board system. It is thus quite promising to boost the energy efficiency of the volatile FD by reducing the pulse width.
On the other hand, the input pulses applied to the readout network have an average amplitude of ~0.7 V and a width of 1 ms. The output currents of the nonvolatile FDs are ~200 nA in average. The power and energy consumptions of the nonvolatile FD are thus roughly estimated to be ~140 nW per input and ~140 pJ per input, respectively. Both the power and energy consumptions of the nonvolatile FD are very small, much lower than those of the volatile FD. Note that in the curvature discrimination task the nonvolatile FDs only performed inference after programming, and thus the programming energy was not considered.